DOCUMENT RESUME 



ED 041 732 



SE 008 344 



AUTHOR        Harke, Douglas J.
TITLE         Hierarchical Analysis of the Randomized Multiple Choice Format.
INSTITUTION   Purdue Univ., Lafayette, Ind.
PUB DATE      Mar 70
NOTE          21p.; Paper presented at the Annual Meeting of the National Association for Research in Science Teaching (43rd, Minneapolis, Minn., March 5-8, 1970)

EDRS PRICE    EDRS Price MF-$0.25 HC-$1.15
DESCRIPTORS   *Evaluation, *Multiple Choice Tests, *Physics, Scientific Concepts, *Secondary School Science, *Test Validity
IDENTIFIERS   Hierarchical Analysis



ABSTRACT 

This paper describes two quantitative methods of 
hierarchical analysis used to measure the similarity of sequences or 
hierarchies of steps on two test formats, the Randomized Multiple 
Choice (RMC) format and the free-response problem test format. The
investigator first used the Consistency Ratio method to
quantitatively assess the transfer relationships for each set of
dependency relationships in the hierarchy of RMC items. Consistency
ratios for six test problems were calculated. The results indicated 
that the responses of the subjects as a whole validated most of the 
dependency relationships within the hierarchies and that in most 
segments of the hierarchies the students probably were required to 
use the same problem solving skills and procedures on the RMC and 
free-response problems. The pattern analysis technique was also used 
to analyze the responses for the complete hierarchy on a subject by 
subject basis. Indices of agreement for the six RMC problems were 
calculated. They revealed the amount of similarity between sequences 
of physical and mathematical concepts used in solving a problem in 
the RMC and free-response formats. Bibliography. (LC) 




Hierarchical Analysis of the Randomized
Multiple Choice Format




Douglas J. Harke*



Purdue University 



Many science instructors are not willing to take advantage
of the labor saving and high reliability features of machine
scored multiple choice tests because of the dichotomous grading
of multiple choice items. Questions frequently used on physics
tests, for example, require several steps to solve and are usually
graded on a continuous scale with the amount of credit awarded
being proportional to the degree of correctness of the solution.
The Randomized Multiple Choice (RMC) format was developed to
facilitate the awarding of partial credit on machine grading of
physics problems which require several steps to solve. This format
is basically similar to that suggested by Nedelsky.⁴ The
problem is stated conventionally and the student is asked
to write out a solution to this problem in the space
provided below the statement of the problem. Below this space
are five or more multiple choice items which correspond to steps
in a correct written solution. These multiple choice items are
given in a random order and not in the order in which the steps would occur



*Present address is State University College, Geneseo,
New York.



in a written solution. This is to require the student to
organize and use a sequence of physical and mathematical
concepts similar to the sequence used in the solution of the
same problem in free-response form.

The student completes his written solution and uses it to
aid him in answering the RMC items. The student's score on the
problem would be the number of multiple choice items answered
correctly, which should correspond to the number of steps done
correctly in the written solution. The student could mark his
responses to the RMC items on any standard machine-scorable
answer sheet.

The following is an example of an RMC physics test problem.

A car of mass 1500 kg. is travelling at 20 m/sec. when the brakes
are suddenly applied. The car takes 20 sec. to come to a
complete stop. If the brakes of the car dissipate energy at
the rate of 2400 watts to the air while the car is braking, what
is the increase in temperature of the brakes? Assume that the
heat is absorbed only by the brakes. The brakes (specific
heat = 0.10) have a total mass of 40 kg. for the four wheels.

1 kcal. = 4200 j.

1. Before the brakes are applied the car has a kinetic energy of
A. 60,000 j. B. 600,000 j. C. 15,000 j. *D. 300,000 j.
E. 30,000 j.

2. If T is the increase of temperature of the brakes, the
correct expression for heat absorbed by the brakes is
A. 40(4200) T kcal. **B. 40(0.10)(4200) T kcal.
*C. 40(0.10) T kcal. D. 40 T/(0.10) kcal.
E. 40 T/[4200(0.10)] kcal.



*correct answer **alternate answer






3. Which one of the following statements is true?

A. The brakes are capable of dissipating more heat than is
produced.

B. The energy dissipated by the brakes is equal to that
absorbed by the brakes.

*C. The energy dissipated by the brakes is less than that
absorbed by the brakes.

D. The energy dissipated by the brakes is greater than that
absorbed by the brakes.

4. The increase in temperature of the brakes is

A. 63°C. *B. 15°C. C. 630°C. D. 8°C. **E. 0.0036°C.

5. The amount of heat that is retained by the brakes and causes
them to heat up is

*A. 60 kcal. B. 48,000 kcal. C. 15 kcal.

D. 15,000 kcal. **E. 252,000 kcal.
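
The arithmetic behind the starred answers can be checked with a short sketch. It uses only the quantities given in the problem statement; the units of the specific heat (kcal per kg per degree C) are an assumption, since the problem gives only the number 0.10, and the function and variable names are illustrative.

```python
# Worked solution of the sample RMC problem (illustrative sketch).
MASS_CAR_KG = 1500.0
SPEED_M_PER_S = 20.0
STOP_TIME_S = 20.0
AIR_LOSS_WATTS = 2400.0
BRAKE_MASS_KG = 40.0
SPECIFIC_HEAT = 0.10       # assumed units: kcal per kg per degree C
JOULES_PER_KCAL = 4200.0


def solve_brake_problem():
    """Return the intermediate results that the RMC items ask about."""
    kinetic_energy_j = 0.5 * MASS_CAR_KG * SPEED_M_PER_S ** 2   # item 1
    lost_to_air_j = AIR_LOSS_WATTS * STOP_TIME_S                # energy passed to the air
    retained_j = kinetic_energy_j - lost_to_air_j               # energy heating the brakes
    retained_kcal = retained_j / JOULES_PER_KCAL                # item 5
    # Item 2: heat absorbed = mass * specific heat * T, solved for T (item 4).
    temperature_rise_c = retained_kcal / (BRAKE_MASS_KG * SPECIFIC_HEAT)
    return {
        "kinetic_energy_j": kinetic_energy_j,
        "lost_to_air_j": lost_to_air_j,
        "retained_kcal": retained_kcal,
        "temperature_rise_c": temperature_rise_c,
    }


if __name__ == "__main__":
    for name, value in solve_brake_problem().items():
        print(name, value)
```

The results, 300,000 j., 60 kcal. and 15°C., agree with the starred choices in items 1, 5 and 4.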

Some of the RMC items can contain an alternate answer to provide
partial credit for a student who made a common error on an
early step but used the correct procedure through the remainder
of his solution. The alternate answers are anticipated in advance
and are the most common and logical errors that would be made by
the students.

When the RMC items are machine-scored, one answer key is
submitted for the correct answers and one answer key is submitted
for the alternate answers. The scoring machine prints two numbers
on the student's card: one which indicates the total number of
correct answers marked and one which indicates the total number of
alternate answers marked. It should be noted that the RMC format
can be graded, without special procedures, by commercially





available machines that are currently used by many educational
institutions. Other flexible variations of the multiple choice
format have been developed, but commercial scoring equipment
had to be modified or new computer programs had to be written
to grade each test.
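
The two-key scoring described above can be sketched in a few lines. This is only an illustration of the counting, not a description of any particular scoring machine; the keys are those marked in the sample problem, and the function name and the sample responses are hypothetical.

```python
# Two-key scoring of one RMC problem (illustrative sketch).
# Keys follow the sample problem: one key for the correct answers,
# one key for the anticipated alternate answers.
CORRECT_KEY = {1: "D", 2: "C", 3: "C", 4: "B", 5: "A"}
ALTERNATE_KEY = {2: "B", 4: "E", 5: "E"}


def score_rmc(responses, correct_key, alternate_key):
    """Return (number of correct answers, number of alternate answers)."""
    n_correct = sum(
        1 for item, answer in responses.items()
        if item in correct_key and correct_key[item] == answer
    )
    n_alternate = sum(
        1 for item, answer in responses.items()
        if item in alternate_key and alternate_key[item] == answer
    )
    return n_correct, n_alternate


# Hypothetical student who set up the heat expression with the
# extra factor of 4200 and carried that error into item 4.
student = {1: "D", 2: "B", 3: "C", 4: "E", 5: "A"}
print(score_rmc(student, CORRECT_KEY, ALTERNATE_KEY))
```

The two counts correspond to the two numbers printed on the student's card; how alternate answers are weighted for partial credit is left to the instructor.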

The free-response problem test is the most widely used
test format in physics and was used as the standard for
comparison with the RMC format.³ Direct comparisons of scores
on the RMC and free-response formats were used to determine
the similarity of the relative level of performance on the two
test formats. These direct comparisons, using correlations,
did not reveal whether the students used the same mental
processes to solve the same problems on the two test formats.
Therefore an indirect approach using hierarchical analysis
was employed in comparing sequences of mental skills used on
the two test formats.

Some of the objectives of any physics course are to teach
the basic concepts in physics and then to give the student
experience in using these concepts to solve problems in the
physical world. This practice consists of working problems,
some of which require the application of one concept and some of
which require the application of a series of concepts. A
problem test in physics is usually given to measure the student's
competence in selecting and applying the correct sequence of
physical concepts to solve given problems. For a given problem
in free-response form there are very definite sequences in
which certain concepts must be applied to solve the problem
correctly. These sequences can be logically determined from
knowledge of physics.

Each application of a physical concept in a problem solution
could be thought of as a step in the problem solution. The
logically determined sequences of steps necessary to solve the
problem correctly were organized into a hierarchy. Each RMC item
corresponded to a step in the written solution to a free-response
problem. Therefore the hierarchy of RMC items would be
theoretically the same as the hierarchy of steps.

If it can be shown that the sequences in which the
physical concepts are used are the same on the RMC format as on
the free-response format, then it is quite probable that the
same mental processes were used in solving the same problems
on the two test formats. Two quantitative methods of hierarchical
analysis were used to measure the similarity of sequences
or hierarchies of steps on the two test formats.

The Consistency Ratio Method 

An analysis similar to the one used by Gagne² was employed
to quantitatively assess the transfer relationships for each
set of dependency relationships in the hierarchy of RMC items.
A dependency relationship was defined as consisting of a given
RMC item and the RMC items upon which it was directly dependent.

A transfer relationship was defined as the set of dichotomous
responses for a given dependency relationship.

From a knowledge of physics it is known that certain steps
in a problem must be done correctly before the next step can
be done correctly. On this basis it was decided that the
following types of transfer relationships validate the hierarchy
of RMC items.

1. Mastery of both the upper level item and lower level
items.

2. Failure on the upper level item and failure on one or
more of the lower level items.

3. Failure on the upper level item but mastery of all of
the lower level items.

Transfer relationships of the fourth type, where the upper
level item is mastered but one or more of the lower level items
is failed, are contrary to the assumptions underlying the
hierarchy and tend to invalidate it.
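
The four types of transfer relationship defined above can be written as a small classification routine. The sketch below is illustrative only: the function names, the item numbers and the four sample students are hypothetical, and responses are assumed to be coded True for a correct answer and False otherwise.

```python
from collections import Counter


def transfer_type(upper_correct, lower_correct):
    """Classify one student's responses on one dependency relationship.

    upper_correct: True if the upper level item was answered correctly.
    lower_correct: list of booleans, one per lower level item.
    """
    all_lower_mastered = all(lower_correct)
    if upper_correct:
        # Upper mastered: type I if every lower item is mastered, else type IV.
        return "I" if all_lower_mastered else "IV"
    # Upper failed: type III if every lower item is mastered, else type II.
    return "III" if all_lower_mastered else "II"


def tabulate(responses, upper_item, lower_items):
    """Count the four types over a group of students."""
    counts = Counter({"I": 0, "II": 0, "III": 0, "IV": 0})
    for student in responses:
        lower = [student[item] for item in lower_items]
        counts[transfer_type(student[upper_item], lower)] += 1
    return counts


# Hypothetical responses to items 6, 11 and 12, where item 12
# depends directly on items 6 and 11.
sample = [
    {6: True, 11: True, 12: True},     # type I
    {6: True, 11: False, 12: False},   # type II
    {6: True, 11: True, 12: False},    # type III
    {6: False, 11: True, 12: True},    # type IV
]
print(dict(tabulate(sample, upper_item=12, lower_items=[6, 11])))
```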

The first type of transfer relationship would correspond to
a situation in a written solution in which the student mastered
a certain step and all of the previous steps on which this step
was directly dependent. The second type of transfer relationship
would correspond to the written solution situation in which the
student was unable to master a certain step because he missed
one or more of the previous steps. The third type of transfer
relationship would correspond to the written solution situation
in which the student was unable to master a certain step but
was able to master all of the previous steps. The fourth type
of transfer relationship would not occur in a written solution
because the student could not master a certain step without
being able to master all previous steps.

The number of each type of transfer relationship for each
dependency relationship in the hierarchy was tabulated. This
procedure was repeated for each problem. Tabulation of the
number of cases of each type of transfer relationship in each
dependency relationship provided information about performance
on each part of the hierarchy. Such an analysis could be
useful for modification of the RMC items if the number of cases
of the fourth type of transfer relationship rose to an
undesirable level.

In order to provide a quantitative measure of validation
of each dependency relationship in the hierarchy, a consistency
ratio was calculated. The consistency ratio was defined as
the number of transfer relationships consistent with the hierarchy
divided by the total number of transfer relationships.
If I, II, III and IV are used to designate the number of cases
of each type of transfer relationship, then the consistency
ratio can be expressed algebraically as

$$\text{Consistency Ratio} = \frac{I + II + III}{I + II + III + IV}$$

A consistency ratio of unity indicated perfect agreement with
the hierarchy.
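
As a minimal sketch of this definition, the ratio can be computed directly from the four tabulated counts. The function name and the counts in the usage line are hypothetical and are not taken from Table 1.

```python
def consistency_ratio(n_i, n_ii, n_iii, n_iv):
    """Consistent transfer relationships (types I-III) over all four types."""
    total = n_i + n_ii + n_iii + n_iv
    if total == 0:
        raise ValueError("no transfer relationships were tabulated")
    return (n_i + n_ii + n_iii) / total


# Hypothetical counts: 100 students, 10 of whom show the fourth type.
print(consistency_ratio(50, 30, 10, 10))
```

With no cases of the fourth type the function returns 1.0, the value the text calls perfect agreement with the hierarchy.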

A consistency ratio was calculated for each dependency
relationship in each problem. A high consistency ratio would
mean that in working an RMC problem the students had to master
all of the lower level items connected to an upper level item
before they were able to master the upper level item, as would
be expected on a free-response problem. High consistency
ratios on all dependency relationships in the hierarchy would
tend to validate the hierarchy. This would suggest that in
working the RMC problems the students used very similar or
identical sequences to those they would be expected to use if
the problems were presented in free-response form.

Results. The results of the transfer relationship analysis 
are given in Table 1 which shows that the consistency ratios 
ranged from 0.73 to 0.99 with an average value of 0.88. 

The AAAS Science-A Process Approach evaluation committee¹
suggested a consistency ratio equal to or greater than 0.90 for
hierarchy validation. This suggested value of 0.90 was apparently
for situations corresponding to free-response questions.
Guessing on the RMC items could change transfer relationships
from the second type to the fourth type and thus reduce the
consistency ratio. Therefore the minimum consistency ratio
for each dependency relationship would depend on the number
of transfer relationships of the second and fourth type.

Consider, for example, the possible effect of guessing on
the #12: #6, #11 dependency relationship in the second
problem. There were 92 + 47 or 139 students who were unable
to master one or more of the lower level items. By random
guessing approximately 35 of these students could have guessed
the correct answer to the higher level item. This would reduce
to 12 the number of cases contradicting the theoretical
hierarchy, or the corrected-for-guessing consistency ratio
would be 0.93.

Some of the dependency relationships with a lesser number
of transfer relationships of the second and fourth type would
have a smaller guessing correction on the consistency ratio.
Consider, for example, the dependency relationship #9: #7,
#12 in the fourth problem. There were 31 + 45 or 76 students
who were unable to master one or more of the lower level items.
By random guessing approximately 15 of these students could
have guessed the correct answer to the higher level item.
This would reduce to 30 the number of cases contradicting the
hierarchy, or the corrected-for-guessing consistency ratio
would be 0.82.
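
The guessing correction in these two examples can be sketched as follows. The text does not state how many answer choices the upper level items had; the sketch assumes four choices in the first example and five in the second, since those values reproduce the quoted figures of about 35 and 15 lucky guesses. The function name is hypothetical.

```python
def guessing_correction(n_type_ii, n_type_iv, n_choices):
    """Estimate how many type IV cases remain after allowing for guessing.

    Students of types II and IV failed at least one lower level item.
    If each of them guessed at random on the upper level item, about
    1 / n_choices of them would be expected to answer it correctly.
    """
    unable = n_type_ii + n_type_iv
    expected_lucky = round(unable / n_choices)
    remaining_iv = max(n_type_iv - expected_lucky, 0)
    return unable, expected_lucky, remaining_iv


# Second problem, #12: #6, #11 (four answer choices assumed).
print(guessing_correction(92, 47, n_choices=4))
# Fourth problem, #9: #7, #12 (five answer choices assumed).
print(guessing_correction(31, 45, n_choices=5))
```

Under these assumptions the remaining type IV counts are 12 and 30, matching the text. Converting them to corrected consistency ratios also requires the total number of transfer relationships for each dependency relationship, which the text does not give.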

Some of the transfer relationships of the first type might
have been the fourth type had it not been possible for the students
to guess the answer to one or more of the lower level
items. However there was less possibility of this happening
than the possibility of guessing converting the second type
to the fourth type of transfer relationship because the p-values
of the lower level items were greater than the p-values of
the upper level item. The frequency of the first type of
transfer relationship, which appeared low rather than inflated,
provided further evidence that guessing did not change a
great number of relationships from the fourth type to the
first type.

[Table 1. Transfer relationship frequencies and consistency ratios
for each dependency relationship in each problem. The table body is
not legible in the source.]

The effect of guessing could have changed some of the
transfer relationships of the fourth type to the first type
and thus increased the consistency ratio. The guessing correction
which could be applied to the second and fourth types of
transfer relationship was probably an undercorrection.
Therefore the amount of undercorrection would probably be
compensated for by the tendency of guessing to slightly reduce
the frequency of the fourth type of transfer relationship.

The value 0.85 was selected as the minimum consistency
ratio necessary to validate a hierarchy in which the questions
are in multiple choice form. This was probably a fairly safe
choice because it has been shown earlier that guessing could
reduce the consistency ratio by as much as 0.2. In most cases
the reduction was probably closer to 0.1.

Four of the six RMC problems used had average consistency
ratios above the value 0.85. One of the consistency ratios in
the third problem was below 0.85, which indicated a possible
weak link in the hierarchy. All of the consistency ratios
in the second and fourth problems were below 0.85.

The overall results indicated that the responses of the
group as a whole validated most of the dependency relationships
within the hierarchies. The results were conclusive on the
three problems which had all of their consistency ratios above
0.85. The average consistency ratio on the third problem,
which had one consistency ratio below 0.85, was above 0.85.
Therefore the results on the third problem probably also
validated the proposed hierarchy. Two of the problems each
had several low consistency ratios; it was found later
that a two dimensional hierarchy was probably not an adequate
representation of the interrelationships of the correct and
alternate answers.

If the average consistency ratio was an appropriate
measure of the validity of the total hierarchy, then the
experimental results indicated that the RMC item responses
validated the hierarchies which were constructed on the basis
of the interrelationships of the steps in a written solution.

The high consistency ratios specifically showed that the
relationships between certain RMC items were apparently the same
as the relationships between certain steps in the written
solution. This indicated that at least in most segments of
the hierarchies the students probably were required to use the 
same problem solving skills and procedures on the RMC and 
free-response problems. 

Pattern Analysis 

The consistency ratio method considered group responses
as a whole on the complete hierarchy and group responses on
segments of the hierarchy. The pattern analysis technique
developed by Rimoldi and Grib⁵ was used to analyze the responses
for the complete hierarchy on a subject by subject basis.

Rimoldi and Grib developed a versatile method of pattern
analysis to compare bivariate patterns in which a number of
subjects respond to a number of items. The responses to the
items may be dichotomous or may be assigned other numerical
values. If the test vectors of the subjects are combined, the
matrix formed will have rows corresponding to subjects and
columns corresponding to items. This observed matrix can be
compared to an expected matrix using the technique of Rimoldi
and Grib. The expected matrix can be derived from any
model. The only restriction on the expected matrix is that if
the patterns are being compared across subjects then the
subject's total score on the expected pattern must equal his
total score on the observed pattern. A similar restriction
would be placed on the columns if the patterns were compared
across items.

Once the expected pattern has been generated, weights are
calculated for each cell of the expected pattern. The weights
$a_{ij}$ for the cells containing ones are given by

[equation not legible in the source]

where $R_i$ is equal to the number of ones in the ith row and
$C_j$ is equal to the number of ones in the jth column.
The weights $a'_{ij}$ for the cells containing zeros are given by

[equation not legible in the source]

where $R'_i$ is equal to the number of zeros in the ith row and
$C'_j$ is equal to the number of zeros in the jth column.
This method of assigning weights makes no a priori assumptions
as to the pattern expected.

The amount of agreement or correlation between two patterns
is expressed by the index of agreement $I_a$. When patterns are
compared across subjects, this index of agreement can be written
as

[equation not legible in the source]

where $T$ is the sum of all $a_{ij}$ and $a'_{ij}$, $T_a$
is the sum of $a_{ij}$ and $a'_{ij}$ of the cells that are the same in
the expected and observed patterns, and $m_a$ is the sum of
$a_{ij}$ and $a'_{ij}$ corresponding to minimum possible
agreement.
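
The bookkeeping behind such an index can be sketched as follows. The formula itself is not legible in the source, so this is not the Rimoldi and Grib index: the sketch assumes a simple normalized form, (agreeing weight minus minimum) divided by (total weight minus minimum), chosen only because it gives unity for perfect agreement and zero at the minimum possible agreement. The cell weights and the minimum are taken as inputs rather than computed, and all names and sample values are hypothetical.

```python
def agreement_index(expected, observed, weights, minimum_agreement):
    """Normalized weighted agreement between two 0/1 matrices.

    ASSUMED form, not confirmed by the source:
        (agreeing weight - minimum) / (total weight - minimum)
    Rows are subjects and columns are items.
    """
    for exp_row, obs_row in zip(expected, observed):
        # Restriction stated in the text for comparison across subjects:
        # each subject's total score must match in the two patterns.
        if sum(exp_row) != sum(obs_row):
            raise ValueError("row totals of expected and observed differ")
    total = sum(sum(row) for row in weights)
    agreeing = sum(
        w
        for exp_row, obs_row, w_row in zip(expected, observed, weights)
        for e, o, w in zip(exp_row, obs_row, w_row)
        if e == o
    )
    return (agreeing - minimum_agreement) / (total - minimum_agreement)


# Hypothetical data: two subjects, three items, unit weights.
expected = [[1, 1, 0], [1, 0, 0]]
observed = [[1, 0, 1], [1, 0, 0]]
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(agreement_index(expected, observed, weights, minimum_agreement=2.0))
```

In an actual analysis the weights and the minimum possible agreement would have to be taken from the Rimoldi and Grib paper rather than supplied by hand.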

Certain patterns, like those containing mostly zeros or mostly
ones, have fewer permutations than a pattern containing an
equal number of ones and zeros. The possibility of disagreement
increases with the number of permutations. The term $m_a$
therefore reduces the power of the patterns containing
mostly ones or mostly zeros.

The index of agreement varies between unity for perfect
agreement and zero for no agreement. Rimoldi and Grib compared
values of $I_a$ to the coefficients obtained using other pattern
analyses and found that $I_a$ gives conservative values. No
significance test has yet been developed for the index of
agreement.

Hierarchical procedures could provide a method of investigating
the responses to interconnected test items to determine
whether the sequence of responses indicated or validated certain
patterns.

Only a finite number of sequences of dichotomous elements
(1 for correct and 0 for incorrect) were possible according
to the relationships within the hierarchy of RMC items. The
students' response patterns on the RMC items within a problem
were examined to determine whether they corresponded to one
of the expected patterns. If a student's response pattern did
not correspond to one of the expected patterns, then the expected
pattern most closely resembling and having the same
number of each type of element as the observed pattern was
chosen as the expected pattern for that student. These
expected and observed response patterns were compared using the
Rimoldi and Grib procedure. This procedure was carried out
and the index of agreement was computed for each of the six
RMC problems.
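
The selection of an expected pattern for each student can be sketched as below. The sketch assumes that the possible patterns are those in which no item is marked correct unless all of the items it directly depends on are also correct, and that "most closely resembling" means differing in the fewest positions; ties are broken by taking the first candidate found. The three-item hierarchy and all names are hypothetical.

```python
from itertools import product


def admissible_patterns(items, prerequisites):
    """List the 0/1 patterns that do not contradict the hierarchy."""
    patterns = []
    for bits in product((0, 1), repeat=len(items)):
        mastered = dict(zip(items, bits))
        consistent = all(
            all(mastered[p] for p in prerequisites.get(item, ()))
            for item in items
            if mastered[item]
        )
        if consistent:
            patterns.append(bits)
    return patterns


def expected_pattern(observed, patterns):
    """Pick the expected pattern for one observed response pattern."""
    if observed in patterns:
        return observed
    # Keep the same number of ones and zeros as the observed pattern.
    same_total = [p for p in patterns if sum(p) == sum(observed)]
    # Choose the candidate that differs from the observed pattern least.
    return min(same_total, key=lambda p: sum(a != b for a, b in zip(p, observed)))


# Hypothetical hierarchy: item 3 depends directly on items 1 and 2.
items = [1, 2, 3]
patterns = admissible_patterns(items, {3: [1, 2]})
print(patterns)
print(expected_pattern((1, 0, 1), patterns))
```

For this hierarchy the observed pattern (1, 0, 1), in which item 3 is correct although item 2 is not, is replaced by the expected pattern (1, 1, 0).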

The index of agreement was interpreted as the amount of
similarity between the sequences of physical concepts used in
solving a free-response problem and the same problem presented
in RMC format.

The indices of agreement for the six RMC problems used
are given in Table 2.

Table 2. Indices of Agreement

Problem               1      2      3      4      5      6

Index of Agreement  0.789  0.6?0  0.812  0.705  0.939  0.819



The indices of agreement revealed the amount of similarity
between sequences of physical and mathematical concepts used
in solving a problem in the RMC and free-response formats.
The low indices of agreement on the second and fourth problems
may have been the result of the inadequacies of the hierarchical
model chosen.

The index of agreement could best be thought of as a
correlation coefficient because the group was compared subject
by subject on the expected and observed response patterns.
Because this method reduced the contributions made by patterns
with few possible permutations, the index of agreement probably
was an underestimate of the correlation of the expected and
observed response patterns.

Ideally the index of agreement should have been unity,
but the error introduced by guessing reduced the maximum index
of agreement attainable. There was no way of quantitatively
assessing how much the index of agreement was reduced by
guessing.

The relatively large indices of agreement obtained
indicated that the individual students' response patterns to
the RMC items closely resembled the response patterns expected
on the basis of the interrelationships of steps in a written
solution. This further indicated that the students, for the
total problem, used the same sequences of mathematical and
physical concepts on the RMC format that they would have used
if the problems had been presented in free-response form.
Therefore it is very probable that in working the total problem
the students as individuals used the same problem solving
skills and procedures on the RMC and free-response problems.

Conclusions 

The results of the hierarchical analyses indicate that
the students in solving the RMC problems used sequences of
mathematical and physical concepts similar to the sequences
they would have used if the same problems had been presented
in free-response form. Since this indicated that the students
proceeded through the steps of the RMC problem solution in the
same order as they would in a written solution, it could be
inferred that the students probably used the same mental processes
on the two test formats. This argument could be further
strengthened by considering the following example: in a
particular free-response problem solution the student might
be required to apply two different physical concepts and then
combine the results of these applications to arrive at the
final answer. In this example, the student, in addition to
knowledge of the physical concepts, was required to use
cognitive skills similar to those of application and synthesis.
Analysis of the students' response patterns to the RMC items
indicated that they usually correctly answered the RMC items
corresponding to application of the concepts before they could
correctly answer the RMC item which required the synthesis of
the two previous items. Therefore it is highly likely that
the RMC problems required the same or very similar cognitive
processes and problem solving skills as the free-response
problems.






References 



1. AAAS Commission on Science Education. Science-A Process
Approach: An Evaluation and its Application. Second Report.
American Association for the Advancement of Science, 1968.

2. Gagne, R. M. The Acquisition of Knowledge. Psychological
Review, 1962, 69, 355-365.

3. Harke, D. J. Evaluation of the Randomized Multiple Choice
Format. (Doctoral dissertation, Purdue University) Ann
Arbor, Mich.: University Microfilms, 1969.

4. Nedelsky, L. Evaluation of Essays by Objective Tests.
Journal of General Education, 1953, 7, 209.

5. Rimoldi, H. J. A. & Grib, T. F. Pattern Analysis.
British Journal of Statistical Psychology, 1960, 13,
137-149.




